5. JASP: Visualization and Correlation

Johnny van Doorn

University of Amsterdam

2025-09-10

In block 1 we aim to:

Discuss NHST concepts
Introduce JASP
Correlation
Linear model (regression)

Reading: Chapters 1-8.8

In this lecture we aim to:

Introduce JASP
- Descriptive statistics
- Data visualizations
Correlation

Reading: Chapters 4, 5, 7

JASP

Why JASP?

User-friendly, intuitive interface
Open-source (free)
Reproducible analyses (easy to share)
Better than IBM’s SPSS for transparency and workflow
Integrates with R for advanced users

Getting Started with JASP

Download from jasp-stats.org
Open JASP, load your dataset (.jasp, .csv, .sav, etc.)
Familiarize with the interface: Data view, Input panel, Output panel

Loading Data in JASP

Click Open > select your file
Data appears in spreadsheet view

Descriptive Statistics in JASP

Go to Descriptives > select variables
Options: mean, median, SD, min, max, quartiles
Output updates instantly

Some special operations

Filtering data
Split variable in Descriptives

Getting help

JASP Video Library
The book

Visualization

Data visualization

Florence Nightingale

Why Visualize Data?

Quick overview of the data
Spot patterns, outliers, or errors
Invite the reader

Why Visualize Data?

Figure 5.15 Catterplot

Different types of graphs in JASP

Distribution plots (basic plots)
- Frequency plot when nominal
- Histogram when scale/ordinal
Correlation plots (basic plots)
Scatter plots (customizable plots)
Raincloud plots

Boxplots: Visualizing Spread and Outliers

Shows median, quartiles, and outliers
Useful for comparing groups

Figure 5.9 Boxplots

Raincloud plots: Boxplots + Raw Data

Figure 5.11 Raincloud plots

Best Practices in Data Visualization

Use clear labels and titles
Avoid unnecessary 3D effects
Choose appropriate scales
Check for misleading representations

Critically Assess Graphs

Are axes labelled (variable, units)?
Are axes broken/equally spaced?
Are error bars (e.g., confidence intervals) included where relevant?

Critically Assess Graphs

Scale Matters

Error Bars!

Correlation

Pearson Correlation

In statistics, the Pearson correlation coefficient, also referred to as the Pearson’s r, Pearson product-moment correlation coefficient (PPMCC) or bivariate correlation, is a measure of the linear correlation between two variables X and Y. It has a value between +1 and −1, where 1 is total positive linear correlation, 0 is no linear correlation, and −1 is total negative linear correlation. It is widely used in the sciences. It was developed by Karl Pearson from a related idea introduced by Francis Galton in the 1880s.

Source: Wikipedia

Pearson Correlation

\[r_{xy} = \frac{{COV}_{xy}}{S_xS_y}\] Where \(S\) is the standard deviation and \(COV\) is the covariance.

\[{COV}_{xy} = \frac{\sum_{i=1}^N (x_i - \bar{x})(y_i - \bar{y})}{N-1}\]

Plot correlation

\[(x_i - \bar{x})(y_i - \bar{y})\]

Load data

n     <- 10
data <- read.csv("../../../datasets/Album Sales.csv")[, -4]
data <- data[1:n, ] # take the first 10 rows of the album sales data set from Field
DT::datatable(data, rownames = FALSE, options = list(searching = FALSE, scrollY = 415, paging = F, info = F))

Variance

Standardize

\[z = \frac{x_i - \bar{x}}{{sd}_x}\]

z.sales      <- (data$sales   - mean(data$sales))   / sd(data$sales)
z.airplay    <- (data$airplay - mean(data$airplay)) / sd(data$airplay)

Standardize

Covariance

\[{COV}_{xy} = \frac{\sum_{i=1}^N (x_i - \bar{x})(y_i - \bar{y})}{N-1}\]

mean.sales      <- mean(sales, na.rm=TRUE)
mean.airplay    <- mean(airplay, na.rm=TRUE)

delta.sales     <- sales - mean.sales
delta.airplay   <- airplay - mean.airplay

prod <- (sales - mean.sales) * (airplay - mean.airplay)

covariance <- sum(prod) / (N - 1)
covariance

[1] 618.3333

Correlation

\[r_{xy} = \frac{{COV}_{xy}}{S_xS_y}\]

correlation <- covariance / ( sd(sales) * sd(airplay) ); correlation

[1] 0.6864408

correlation

[1] 0.6864408

Correlation

\[r_{xy} = \frac{{COV}_{xy}}{S_xS_y}\] \[{COV}_{xy} = \frac{\sum_{i=1}^N (x_i - \bar{x})(y_i - \bar{y})}{N-1}\]

cor(  sales,   airplay) # correlation

[1] 0.6864408

cor(z.sales, z.airplay) # correlation of z-scores

[1] 0.6864408

# covariance of z-scores
sum(z.sales * z.airplay ) / (N - 1)

[1] 0.6864408

Plot correlation

Significance testing using \(t\)

A test statistic with a known probability distribution (the t-distribution).

It’s a standardized measure of how closely our observed statistic matches the value claimed by \(\mathcal{H}_0\):

\[t = \frac{\text{estimate} - \mathcal{H}_0 \text{ value}}{\text{SE}(\text{estimate})}\]

Significance of a correlation

We convert \(r\) to a \(t\)-statistic because \(t\) has a sampling distribution: the \(t\)-distribution with df = N-2:

\[t_r = \frac{r \sqrt{N-2}}{\sqrt{1 - r^2}}\]

\[{df} = N - 2\]

Hypotheses for \(t\)

\[ \begin{aligned} H_0 &: t_r = 0 \\ H_A &: t_r \neq 0 \\ H_A &: t_r > 0 \\ H_A &: t_r < 0 \\ \end{aligned} \]

\(r\) to \(t\)

df  <- N-2
t.r <- ( correlation*sqrt(df) ) / sqrt(1-correlation^2)

t.r

[1] 2.669948

df

[1] 8

Visualize

Locate in \(t\)-distribution

Closing

Next Week

Partial correlation
The linear model
- Predictions
- Assumptions
- Fun

Recommended Exercises

Exercise 5.1, Exercise 5.5
Exercise 7.1, Exercise 7.6, Exercise 7.7
- Note: pets.jasp can also be downloaded from the Data page

Contact